Visualization  Methods  for  the  Exploration  of  High 

Dimensional  Data 


Final  Report 


Edward  J.  Wegman 


February  16, 1995 


U.  S.  Army  Research  Office 
DAAL03-91-G-0039 

George  Mason  University 
Center  for  Computational  Statistics 


APPROVED  FOR  PUBLIC  RELEASE; 
DISTRIBUTION  UNLIMITED 



Introduction 


This project focused on the development of data analytic and statistical methodology for data sets
which may be characterized by one or more of the properties that they are large in size, high in dimension and
nonhomogeneous. A major thrust is in the visualization of both data point clouds and mathematical structures
in high dimensions. Several techniques were proposed including parallel coordinate density plots,
3-dimensional Andrews plots, and grand (or guided) tours in 3 or higher dimensions. A combination of
mathematical analysis and graphics display is our basic approach to these visualization problems. A closely
related area is structural inference for high dimensional structures. By this is meant the estimation
of solid structures including k-dimensional flats in n-dimensional space as well as other nonlinear manifolds in
high dimensional space. Proposed techniques involved 1. the detection and estimation of k-flats, thick k-flats
and nonlinear manifolds of modest curvature by exploitation of the projective duality for parallel coordinates
and 2. the estimation of more severely curved manifolds by use of ridges on k-dimensional density estimates.
The parallel coordinate projective duality is that in parallel coordinates lines are represented by points and vice
versa. Since k linearly independent lines are sufficient to uniquely specify a k-flat, it appeared to be possible to
identify an arbitrarily oriented k-flat in n-space by appropriately exploiting parallel coordinates.

We proposed to focus on several aspects of computational statistics. The main focus was the
development of methods for the visualization of multidimensional structure. The visualization of
multidimensional structure is a key element in exploratory analysis of high dimensional data, but, of course,
with much broader spinoff in terms of other scientific areas. We suggested four research topics related to the
visualization: 1. Three-Dimensional Andrews and Related Plots, 2. The Grand Tour in Three Dimensions, 3.
Finding Structure in k-Dimensions Using Grand Tour and Parallel Coordinates, and 4. Structural Inference
using Ridge Estimation in Hyperspace.


Three-Dimensional Andrews and Related Plots

An Andrews plot is a multidimensional plotting device that is somewhat related to the parallel
coordinate methodology. There are several conceptual viewpoints that can be described in connection with
Andrews plots. First of all think of a data vector $(X_1, \ldots, X_n)$ as represented by pairs of the form
$(1, X_1), \ldots, (n, X_n)$. One way to think of the parallel coordinate plot is as a linear interpolation
between these points. The reason for using a linear interpolation is that the transformation from Cartesian
space to parallel coordinate space is a projective transformation and, thus, leads to an elegant geometric
interpretation of mathematical structure. In particular, we can map Cartesian geometric structures into
parallel coordinate geometric structures. However, other general sets of interpolations may be suggested.
The earliest one is essentially a Fourier interpolation. That is, plot a multidimensional vector as a
trigonometric polynomial expansion with coefficients determined by the weights $X_j$. Specifically Andrews
suggests plotting

$$f_X(\theta) = X_1/\sqrt{2} + X_2 \sin(\theta) + X_3 \cos(\theta) + X_4 \sin(2\theta) + X_5 \cos(2\theta) + \cdots$$


Each unique point gets mapped into a unique trigonometric polynomial. These are then plotted in a way similar
to parallel coordinate plots. Two properties of Andrews plots are interesting. First, because of the Fourier
series interpretation, the classic Parseval's Theorem holds. Parseval's Theorem basically has to do with
$L_2$-norms and asserts that mean square error in the Fourier domain and mean square error in the untransformed
domain are the same. Thus while the untransformed domain is n-dimensional Euclidean space, the Fourier
domain is 2-dimensional space so that looking at an Andrews plot we can visually get an idea of the mean
square error structure. The second property relates to the fact that we are talking about orthonormal
trigonometric series. Because of this (thinking of the x-axis variable as an angle, say $\theta$), for every
$\theta$ we get a different linear weighting of the $X_j$'s. We can think of a slice at $\theta$ as a
1-dimensional projection of the multivariate vector onto an axis whose orientation is determined by $\theta$.
It has been argued that this in effect gives us a one-dimensional grand tour. As with any grand tour this
offers us the possibility of looking for orientations that show up interesting or unusual properties.
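The Andrews function and its Parseval-type property can be illustrated with a minimal sketch. The function names, the NumPy dependency, and the choice of $[-\pi, \pi)$ as the plotting interval are assumptions made for illustration, not details taken from the report. With this basis, the integrated squared difference between two curves equals $\pi$ times the squared Euclidean distance between the two data vectors.

```python
import numpy as np


def andrews_curve(x, theta):
    """Evaluate f_x(theta) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # The first basis function is the constant 1/sqrt(2).
    values = np.full(theta.shape, x[0] / np.sqrt(2.0))
    for j in range(1, len(x)):
        harmonic = (j + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
        if j % 2 == 1:                   # x2, x4, ... multiply sines
            values = values + x[j] * np.sin(harmonic * theta)
        else:                            # x3, x5, ... multiply cosines
            values = values + x[j] * np.cos(harmonic * theta)
    return values


def curve_distance_sq(x, y, n_grid=4096):
    """Numerically integrate (f_x - f_y)^2 over [-pi, pi)."""
    theta = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    diff = andrews_curve(x, theta) - andrews_curve(y, theta)
    return float(np.sum(diff ** 2) * (2.0 * np.pi / n_grid))
```

Evaluating `andrews_curve(x, theta0)` at a single angle `theta0` gives the 1-dimensional projection of `x` described above, so sweeping `theta0` across the interval traces out the one-dimensional tour.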


There is nothing inherently sacred about either the piecewise linear (parallel coordinate plot) or the
trigonometric (Andrews plot) interpolation. The former is useful because it preserves geometric properties, the
latter because of the mean square interpretation. The 1-dimensional grand tour would work with any
orthonormal series so there may be some other interesting orthonormal series to think about. It may be that we
can invent series which highlight different properties so that we can have a family of plots designed to explore
different aspects of the structure. That is to say, if we are interested in highlighting clustering or outliers, we
propose to invent an orthogonal series that would exaggerate those aspects of the data in the plot. Thus, we
could generalize the parallel coordinate and Andrews plots.

Our work in this area was published in Wegman and Shen (1993) and also described in Wegman, Carr
and Luo (1993), and Wegman and Carr (1993). One key result was to do an expansion in two dimensions
instead of just one. What I have described before is an expansion $f(\theta; X)$ where $X = (X_1, \ldots, X_n)$
and where $f$ is either a piecewise linear interpolant or a trigonometric series. I used a bivariate expansion,
say $f(\theta; X) = (f_1(\theta), f_2(\theta))$, as a 2-dimensional Fourier transform with irrational phase
ratio (or, in fact, any orthonormal series). In this situation I was able to preserve the Parseval-type
property and create the two-dimensional pseudo-grand tour. We can think of a 3-dimensional plot, plotting
$f(\theta)$ against $\theta$. With the $\theta$ axis corresponding to the x axis and $f$ to the y-z axes, we
implemented this in our VR lab with rotation around the x-axis to help visualize the three-dimensional
structure. Having a three-dimensional plot helps uncover more structure in the data than a simple
two-dimensional plot would. Moreover, we are able to rotate the plot so that the y-z plane is the screen plane.
Then slicing this graph along the x-axis would correspond to doing a two-dimensional grand tour. This
provided a unified treatment of Andrews/parallel-coordinate-type plots with the grand tour idea.
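One way to picture the bivariate expansion is as a pair of projection vectors that stay orthonormal for every value of $\theta$. The sketch below interleaves sines and cosines at mutually incommensurable frequencies. The frequency choice (square roots of distinct primes), the restriction to even dimension, and the function names are assumptions for illustration and may differ from the construction in Wegman and Shen (1993).

```python
import numpy as np

# Hypothetical frequency set: square roots of distinct primes have
# irrational ratios, so the path never exactly repeats.
_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def pseudo_tour_frame(theta, dim):
    """Return two orthonormal vectors a1(theta), a2(theta) in R^dim (dim even)."""
    if dim % 2 != 0 or dim // 2 > len(_PRIMES):
        raise ValueError("this sketch assumes an even dimension of at most 20")
    lam = np.sqrt(np.array(_PRIMES[: dim // 2], dtype=float))
    s, c = np.sin(lam * theta), np.cos(lam * theta)
    a1 = np.empty(dim)
    a2 = np.empty(dim)
    a1[0::2], a1[1::2] = s, c      # (sin, cos, sin, cos, ...)
    a2[0::2], a2[1::2] = c, -s     # (cos, -sin, cos, -sin, ...)
    scale = np.sqrt(2.0 / dim)     # makes both vectors unit length
    return scale * a1, scale * a2


def project(data, theta):
    """Project each row of `data` onto the plane spanned by a1 and a2."""
    a1, a2 = pseudo_tour_frame(theta, data.shape[1])
    return np.column_stack((data @ a1, data @ a2))
```

Plotting `project(data, theta)` for an increasing sequence of `theta` values gives one scatter plot per slice, which corresponds to slicing the three-dimensional plot along the x-axis.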


The Grand Tour in Three and Four Dimensions

The grand tour is a very interesting idea first and primarily exposited by Asimov, and never really
given its full due, we believe, because it is computationally intensive and technically fairly difficult. The
intuitive idea underlying the grand tour is as its name suggests: if we want to investigate a data set we "look
at it from all angles," much as if we were doing a grand tour of a geographic place we would try to look at it
in all aspects. Thus, for example, if we are exploring a ten-dimensional data set, we would like to look at it
from as many different perspectives as we could. The original mathematical implementation was as projections
into two-dimensional planes (flats). The collection of all two-dimensional flats in an n-dimensional Euclidean
space is called a Grassmannian manifold. The idea is to create a space filling path (i.e. one that visits all
elements of the Grassmannian manifold) in some continuous fashion with the additional restriction that the
proportion of time spent in each region is proportional to the size of that region. That is to say we do not
linger in a small region of the whole space. If we then think of stepping along this path, we get a series of
2-dimensional planes onto which we can project the n-dimensional point cloud. If there is no structure in the
point cloud, then every two-dimensional projection should look like an uncorrelated scatter diagram. If there
is (two-dimensional) structure, then some projections will have interesting non-trivial patterns and these can
be modeled. Two problems arise. First, if the dimension of the data space is high, then the number of two-flats
needed to get a reasonably dense collection of two-planes is very large. This means that in any real
implementation there is a tradeoff between density of planes and reasonable viewing time. However, if the
density of viewing planes is fairly low, some perspectives will be missed and consequently some interesting
projections may be lost. Second, the methods for choosing the path through the Grassmannian manifold are
either computationally very tedious or not mathematically elegant and visually unappealing. Moreover, even
if these aspects could be dramatically improved, it is clear that looking at a sequence of two-dimensional
projections will allow us to detect unusual two-dimensional patterns, but it will not necessarily allow us to
detect unusual patterns in 3 dimensions.

The fact that we have 3-dimensional display devices suggests that we could create, and indeed we have
tried creating, a grand tour in three dimensions. The idea is that in an n-dimensional space there would be a
large number of three-dimensional subspaces. Instead of stepping through a sequence of two-flats, we could
step through a sequence of three-flats. There are, of course, as many two-dimensional flats (coordinate
systems) as there are three-dimensional flats (coordinate systems) in the sense that both have the same
cardinality and are uncountably infinite. Nonetheless, in a practical implementation, we do not have to step
through as many 3-D coordinate systems as 2-D coordinate systems in order to densely approximate all possible
systems. In the two-dimensional grand tour we are interested in determining two-flats. These will be
determined by a pair of unit length vectors, say (a, b), which are orthogonal and which span a given plane. Of
course, if each of the components of (a, b) contains only 0s and 1s, these will correspond to planes of the
original coordinate axes system. Thus the 2-flat of interest is span(a, b). We have achieved two important
results: 1) We have generalized the grand tour to general k-dimensional representations, i.e. we have created
a time-dependent series of orthonormal vectors in k-dimensions, $(a_1(t), a_2(t), \ldots, a_k(t))$ (see
description below) and, 2) We have found a computationally efficient algorithm for a 2-dimensional
pseudo-grand tour (see description above). These results were reported in Wegman (1991b), Wegman and Shen
(1993), Wegman, Carr and Luo (1993) and Wegman and Carr (1993).
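A time-dependent orthonormal frame $(a_1(t), \ldots, a_k(t))$ can be sketched by composing plane rotations whose angles grow linearly in $t$ at different rates, in the spirit of Asimov's torus method. This is an illustrative stand-in, not the algorithm developed under this project; the rotation rates, the function names, and the NumPy dependency are assumptions.

```python
import itertools

import numpy as np


def givens(n, i, j, angle):
    """n x n rotation by `angle` in the (i, j) coordinate plane."""
    rot = np.eye(n)
    c, s = np.cos(angle), np.sin(angle)
    rot[i, i] = c
    rot[j, j] = c
    rot[i, j] = -s
    rot[j, i] = s
    return rot


def tour_rates(n, seed=0):
    """Hypothetical rotation rates, one per coordinate plane."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.5, 1.5, size=n * (n - 1) // 2)


def tour_frame(t, n, k, rates):
    """Return an n x k matrix whose columns are orthonormal at every time t."""
    q = np.eye(n)
    planes = itertools.combinations(range(n), 2)
    for (i, j), rate in zip(planes, rates):
        q = q @ givens(n, i, j, rate * t)
    # The first k columns of a rotation matrix form an orthonormal k-frame.
    return q[:, :k]
```

For a data matrix `data` with n columns, `data @ tour_frame(t, n, k, rates)` gives the coordinates of the point cloud in the k-flat visited at time `t`; taking k = 2 or k = 3 corresponds to the two- and three-dimensional tours discussed above.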

Finding  Structure  in  k-Dimensions  using  Grand  Tour  and  Parallel  Coordinates 

The project here is conceptually closely related to our earlier discussions of the grand tour. As
indicated earlier, the advantage of doing a 3-dimensional grand tour is two-fold. First, it allows one to see
unusual 3-dimensional configurations instead of simply unusual 2-dimensional configurations. Second, it
allows a more complete search of the k-dimensional space because, for practical purposes, there are fewer
3-flats needed than 2-flats to attain the same density. Because parallel coordinates is a convenient tool for
representing data in dimensions 4 and higher, a natural suggestion is to combine the parallel coordinate
representation with the grand tour notion. Generalizing our earlier notion, suppose
$(a_1, a_2, \ldots, a_k)$ is a vector of k mutually orthogonal unit vectors whose span is a k-flat. This is, so
to speak, a Grassmannian manifold of k-flats instead of 2-flats. The idea is to find a continuous, dense path
through this Grassmannian manifold and use the k-flats (k-dimensional coordinate systems) so generated as a
sequence of coordinate systems in which we can plot the data. Of course, we would not plot using Cartesian
coordinates, but we can plot using parallel coordinates. Again we would be searching for unusual structure.
One structure that would be of interest is finding that the data lie on one or more k-flats or other
k-manifolds. For example, verifying that the data were co-(k-)planar in some orientation of a k-flat would
essentially suggest that a multiple linear regression with 1 dependent and (k - 1) independent variables is an
appropriate model. Other structures would suggest other models.
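The link between coplanarity and multiple linear regression can be checked numerically. The sketch below uses a singular value decomposition as a numerical stand-in for the visual diagnostic described above: if the smallest singular value of the centered data is essentially zero, the data lie on a hyperplane, and the corresponding singular vector yields the regression coefficients. The function names are hypothetical and the SVD approach is a swapped-in technique, not part of the reported methodology.

```python
import numpy as np


def fit_hyperplane(points):
    """Fit normal . x = offset; return (normal, offset, smallest singular value)."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    _, sing, vt = np.linalg.svd(points - center, full_matrices=False)
    normal = vt[-1]                      # direction of least spread
    return normal, float(normal @ center), float(sing[-1])


def regression_from_normal(normal, offset, dep=0):
    """Solve normal . x = offset for coordinate `dep` in terms of the others."""
    coef = -np.delete(normal, dep) / normal[dep]
    intercept = offset / normal[dep]
    return float(intercept), coef
```

A smallest singular value that is small but not zero corresponds to a thick flat, in which case the recovered coefficients are those of an orthogonal (total least squares) fit rather than ordinary least squares.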

The trade-off is obvious. As k gets larger, the ability to look for unusual higher dimensional structure
improves. Also the density of k-flats is much higher than the density of 2- or 3-flats and so it appears
plausible that we could look more closely at the n-dimensional space. The bad news is that the computation of
the unit vectors $(a_1, a_2, \ldots, a_k)$ is likely to become computationally more intensive. How bad this
might be is not yet clear.


Let us consider a related observation. We know that lines in parallel coordinates represent points in
Euclidean space and similarly, points in parallel coordinates represent lines in Euclidean space. Suppose we
have a bunch of points in Euclidean space chosen randomly except that they all lie on a plane, say a d-flat.
They are represented by a collection of line segments joining parallel coordinate axes. Let the $i$th point in
Euclidean n-space be represented by the broken line $C_i$, and let the segment of $C_i$ between axis $j$ and
axis $j + 1$ be $C_i^j$, $j = 1, 2, \ldots, n - 1$. The intersection of $C_i^j$ and $C_k^j$ is a point in
parallel coordinate space representing a line in Euclidean space; denote it by $V^j$. Joining this to
$V^{j+1}$ gives us a new line segment in parallel coordinate space, say $\tilde{C}^j$, which represents a point
in Euclidean n-space. Since the lines represented by the $V^j$ are coplanar, their intersections, represented
by the $\tilde{C}^j$, are also coplanar. This implies that all of the segments $\tilde{C}^j$ should have a
common intersection as $j$ ranges from 1 to n. Indeed, if there is not one but several intersections for each
$j$, this suggests that there are not one but several planes. Generalizing this process to higher dimensions,
this suggests another diagnostic tool for detecting when a point cloud lies on one or more k-flats. Coupled
with the k-dimensional grand tour, this may be a very powerful geometric diagnostic tool for inferring data
structure in higher dimensions.
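The basic point-line duality underlying this diagnostic is easy to verify in two dimensions. In the sketch below the two parallel axes are placed at horizontal positions 0 and 1, which is an assumed convention, and the function names are hypothetical. Points lying on the Cartesian line $x_2 = m x_1 + b$ with $m \neq 1$ map to parallel coordinate lines that all pass through the single point $(1/(1-m),\ b/(1-m))$.

```python
import numpy as np


def segment_intersection(p, q):
    """Intersect the parallel coordinate lines of the 2-D points p and q.

    The line for a point (x1, x2) has height x1 + t * (x2 - x1) at
    horizontal position t. Returns (t, height), or None if the lines are parallel.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    slope_p = p[1] - p[0]
    slope_q = q[1] - q[0]
    if np.isclose(slope_p, slope_q):
        return None
    t = (q[0] - p[0]) / (slope_p - slope_q)
    return float(t), float(p[0] + t * slope_p)


def dual_point(m, b):
    """Parallel coordinate point representing the line x2 = m * x1 + b, m != 1."""
    return 1.0 / (1.0 - m), b / (1.0 - m)
```

When the data are scattered about a line rather than lying exactly on it, the pairwise intersections returned by `segment_intersection` cluster around `dual_point(m, b)` instead of coinciding.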


A related problem is to diagnose nonlinear structure. If we have data on a nonlinear k-manifold, the
given technique may not be entirely appropriate. This technique is fairly robust to some scatter off
of the plane (i.e., when dealing with a thick slab). If so, then a k-manifold which has small to moderate
curvature may be regarded as a modestly thick slab and although the segments will not have exactly common
intersections the intersections should cluster tightly. The idea is then to introduce nonlinear transformations
of the data and look at the plot of the intersections as a graphical tool for diagnosing how well the
transformation is linearizing the data fit. Of course, if the k-manifold is highly curved, there may not be
any indication of planarity. This work was reported in Wegman (1991b) and has been coded into a software
package titled ExplorN which is co-authored by Carr, Luo, Wegman and Shen.

Structural  Inference  using  Ridge  Estimation  in  Hyperspace 

This problem arises from an attempt to abstract the general idea of nonparametric regression. The idea
of regression, of course, is that there is a response variable, say $Y$, and one or more predictor variables,
say $X_1, \ldots, X_d$. In regression we attempt to find a function, say $f$, so that $Y$ is approximated by
$f(X_1, \ldots, X_d)$ in some sense, usually least squares. This gives the random variable $Y$ some sort of
preferred status over the variables $X_1, \ldots, X_d$. This may or may not be appropriate. We can however
think of the variables $Y, X_1, \ldots, X_d$ as a vector which describes a point in a (d+1)-dimensional space.
These points satisfy some functional relationship, that is there exists a function, say $F$, such that
$F(Y, X_1, \ldots, X_d) = 0$. Another way of thinking about this is geometrically, i.e.
$\mathfrak{M} = \{(Y, X_1, \ldots, X_d): F(Y, X_1, \ldots, X_d) = 0\}$ is some sort of hypersurface of
dimension k embedded in a d dimensional space. To make this concrete by an example, let d = 2, k = 1 and
$F(Y, X) = Y - \sin(X)$. Then the points $(Y, X)$ in $\mathfrak{M}$ are exactly the points in two dimensions
lying on the $Y = \sin(X)$ curve. $\mathfrak{M}$ is a one-dimensional set in a two-dimensional space. Because
we are dealing with random variables we cannot expect the points to lie exactly on the hypersurface,
$\mathfrak{M}$ (technically $\mathfrak{M}$ should be called an algebraic manifold), but to be scattered off of
it. Thus we should really think about $F(Y, X_1, \ldots, X_d) = \epsilon$ so that taking expected values we
find that $\mathfrak{M} = \{(Y, X_1, \ldots, X_d): E\,F(Y, X_1, \ldots, X_d) = 0\}$ is the manifold we would
like to estimate. Notice in the regression case if we let
$F(Y, X_1, \ldots, X_d) = Y - f(X_1, \ldots, X_d)$, then $F(Y, X_1, \ldots, X_d) = \epsilon$ corresponds to
$Y = f(X_1, \ldots, X_d) + \epsilon$. In general, in this description we have left $Y$ in to draw the analogy
to usual regression but $Y$ is not intended to have a preferred status. Thus from now on we shall simply
consider $F(X_1, \ldots, X_d)$ and define $\mathfrak{M}$ as
$\{(X_1, \ldots, X_d): E\,F(X_1, \ldots, X_d) = 0\}$. Thus finding the functional relationship among the Xs
(i.e. in my language structural inference) is equivalent to estimating the manifold, $\mathfrak{M}$. Since
$\mathfrak{M}$ is a geometric structure in hyperspace, we have the potential of visualizing it through some of
our graphical techniques.

We suggest a connection with probability densities. Consider a plot of a two-dimensional normal
density. In general this will be a surface in three dimensions. If we try to think of the best zero-dimensional
summary of the density most people would probably suggest the mean. Since the mean and mode of the normal
are co-located, this would also be the mode. Let me use language which suggests a solution for higher
dimensions. The best zero-dimensional summary is the location of the mode, which is the projection of the
maximal zero-dimensional manifold on the surface of the density. If we try to think of the best
one-dimensional summary, think of the fact that slices of the density parallel to the X-Y plane have elliptical
cross section with a major axis and a minor axis. Operationally we would probably want to choose our summary
as the major axis of the density. Notice if the cross section were circular, correlation would be 0 and there
would be no difference between major and minor axes. Basically it would not make sense to talk about a best
one-dimensional summary. If, however, the correlation were plus or minus 1, the minor axis would have zero
length and the major axis would coincide with the usual regression line. (Because of perfect correlation there
would be no scatter off of this line.) If we think of the ridge on the density surface (ridge in the intuitive
sense like on a mountain or hill), the major axis will lie beneath this ridge. In some sense, the ridge we have
just described is the maximal one-dimensional manifold on the surface of the two-dimensional density. The best
1-dimensional manifold estimate is the support of the ridge, i.e. the closure of the set of points for which
the ridge is positive. The idea in general is to find the maximal k-dimensional manifold on the d-dimensional
surface of the density which we will define as the k-ridge. The k-skeleton is the k-dimensional manifold which
is the support of the k-ridge in the d-dimensional space. The research problem was to construct a suitable
definition of the k-ridge and to construct reasonable estimators. A potentially reasonable estimation procedure
for the k-ridge is to estimate the probability density function and find the maximal k-ridge on it. Another
element of the research was to implement a 3-dimensional surface projection of the k-skeleton for k = 2 or 3,
either on the Silicon Graphics machine or using our VR immersive technology. The 0-skeleton is the mode. These
other estimators are multidimensional analogues of the mode. This work has been reported in Wegman, Carr and
Luo (1993) and in numerous invited presentations. The completed research will form the substance of the
dissertation of our Ph.D. student, Qiang Luo. Mr. Luo will be awarded his Ph.D. in May, 1995. The work has
also been made available in software entitled MasonRidge, authored by Luo and Wegman.
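The two simplest summaries in this discussion, the mode (0-skeleton) and the major axis beneath the ridge of a bivariate normal, can be estimated with a short sketch. The mode is found here by mean-shift iteration on a Gaussian kernel density estimate, and the major axis by an eigendecomposition of the sample covariance. Both are standard techniques swapped in for illustration; they are not the k-ridge estimators implemented in MasonRidge, and the function names and bandwidth are assumptions.

```python
import numpy as np


def mean_shift_mode(data, start, bandwidth, n_iter=200, tol=1e-8):
    """Climb a Gaussian kernel density estimate from `start` to a local mode."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        sq_dist = np.sum((data - x) ** 2, axis=1)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        new_x = weights @ data / weights.sum()   # kernel-weighted mean
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def major_axis(data):
    """Unit vector along the major axis of the sample covariance ellipse."""
    cov = np.cov(np.asarray(data, dtype=float), rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]    # eigh returns eigenvalues in ascending order
```

For data drawn from a bivariate normal with positive correlation and equal variances, the mode estimate should fall near the mean and the major axis should lie close to the 45-degree direction, up to sign.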


Other  Work 

The four topic areas described above were the topics outlined in the research proposal upon which the
award was made. However, there has been an extensive amount of additional work produced under this
contract. This additional work generally falls into the categories of: 1) nonparametric density and function
estimation (Le and Wegman, 1991; Miller and Wegman, 1991; Hearne and Wegman, 1991; Hearne and
Wegman, 1992; Le and Wegman, 1993; Hearne, 1994; Marchette et al. 1994; Solka et al. 1994a; Hearne and
Wegman, 1994 and Solka et al. 1994b), 2) parallel and high performance computing in statistics (Wegman,
1991a; Xu, Miller and Wegman, 1991; Sullivan and Wegman, 1994; Poston and Solka, 1994; Wegman and
Jones, 1994; Takacs, Wegman and Wechsler, 1994; Fauntleroy and Wegman, 1994; Wegman, 1994; Sullivan,
1994; and Sullivan and Wegman, 1995), 3) stochastic modeling (Wegman and Habib, 1992; and Chow, 1994)
and, finally, 4) historical (Wegman, 1992; Wegman, 1993).